An Evaluation of Thread Migration for Exploiting Distributed Array Locality
Abstract
Thread migration is one approach to handling remote memory accesses on distributed-memory parallel computers. In thread migration, threads of control move between processors to access data local to those processors, whereas conventional approaches tend to move data to the threads that need it. Migration approaches enhance spatial locality by making large address spaces local, but they are less adept at exploiting temporal locality. Data-moving approaches, such as cached remote memory fetches or distributed shared memory, can exploit both types of locality. We present an experimental evaluation of thread migration’s ability to reduce the impact of remote array accesses on distributed-memory computers. Nomadic Threads uses compiler-generated fine-grain threads that either migrate to make data local or fetch cache lines, tolerating latency through multithreading. We compare these alternatives using various array access patterns.
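The locality trade-off described in the abstract can be illustrated with a toy message-counting model. This is only a sketch under stated assumptions, not the Nomadic Threads implementation or the paper's methodology: the block distribution, the unbounded cache, and the function names `owner`, `count_migrations`, and `count_line_fetches` are all hypothetical, and the model counts messages without weighting their cost (a migration carries thread state, while a fetch is a round trip for one cache line).

```python
def owner(index, block_size):
    """Processor that owns an element under an assumed block distribution."""
    return index // block_size


def count_migrations(accesses, block_size, start_proc=0):
    """Count thread migrations: the thread moves to the owner of each
    element that is not on the processor it currently occupies."""
    current = start_proc
    migrations = 0
    for i in accesses:
        home = owner(i, block_size)
        if home != current:
            current = home
            migrations += 1
    return migrations


def count_line_fetches(accesses, block_size, line_size, home_proc=0):
    """Count remote cache-line fetches: the thread stays on home_proc and
    fetches each remote line once (assumption: an unbounded cache with no
    eviction and no invalidation)."""
    cached_lines = set()
    fetches = 0
    for i in accesses:
        if owner(i, block_size) == home_proc:
            continue  # local element, no message needed
        line = i // line_size
        if line not in cached_lines:
            cached_lines.add(line)
            fetches += 1
    return fetches


# Assumed setup: 4000 elements, 1000 per processor, 8 elements per line.
sweep = range(4000)            # one sequential pass (spatial locality)
ping_pong = [0, 1000] * 100    # two elements reused alternately (temporal locality)

sweep_migrations = count_migrations(sweep, 1000)
sweep_fetches = count_line_fetches(sweep, 1000, 8)
ping_pong_migrations = count_migrations(ping_pong, 1000)
ping_pong_fetches = count_line_fetches(ping_pong, 1000, 8)
```

In this toy setup the sequential sweep needs 3 migrations but 375 line fetches, while the alternating pattern needs 199 migrations but a single line fetch. That matches the qualitative claim that migration favors spatial locality and cached fetching favors temporal locality; the actual crossover depends on message costs and cache behavior that the model ignores.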
Similar Resources
The Thread Migration Mechanism of DSM-PEPE
In this paper we present the thread migration mechanism of DSM-PEPE, a multithreaded distributed shared memory system. DSM systems like DSM-PEPE provide a parallel environment to harness the available computing power of computer networks. DSM systems offer a virtual shared memory space on top of a distributed-memory multicomputer, featuring the scalability and low cost of a multicomputer, and t...
Judicious Thread Migration When Accessing Distributed Shared Caches
Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability reasons, designs with high core counts tend towards tiled CMPs with physically distributed shared caches. This naturally leads to a Non-Uniform Cache Architecture (NUCA) design, where on-chip access latencies depend on the physical distances between requesting cores and home cores where the data i...
Multithreading and Thread Migration Using MPI and Myrinet
The balance between CPU speed and interconnection network throughput in distributed memory parallel computers varies with each generation of systems, but the trend is that CPUs are gaining performance faster than the interconnection networks. This means that remote data accesses are becoming more expensive relative to local accesses in terms of CPU cycles. Therefore, remote memory access mechan...
Exploiting Data Locality on Scalable
OpenMP offers a high-level interface for parallel programming on scalable shared memory (SMP) architectures, providing the user with simple work-sharing directives while relying on the compiler to generate parallel programs based on thread parallelism. However, the lack of language features for exploiting data locality often results in poor performance since the non-uniform memory access times on...
A Multithreaded CGRA for Convolutional Neural Network Processing
The convolutional neural network (CNN) is an essential model for achieving high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CNN acceleration with high energy efficiency and processing performance is efficient data reuse by exploiting the inherent data locality. In this paper, we propose a novel CGRA (Coar...